Automatically Detecting and Attributing Indirect Quotations
Abstract
Direct quotations are used for opinion mining and information extraction because they have an easy-to-extract span and can be attributed to a speaker with high accuracy. However, focusing only on direct quotations ignores around half of all reported speech, which takes the form of indirect or mixed speech. This work presents the first large-scale experiments in indirect and mixed quotation extraction and attribution. We propose two methods of extracting all quote types from news articles and evaluate them on two large annotated corpora, one of which is a contribution of this work. We further show that direct quotation attribution methods can be successfully applied to indirect and mixed quotation attribution.
Similar resources
A Joint Model for Quotation Attribution and Coreference Resolution
We address the problem of automatically attributing quotations to speakers, which has great relevance in text mining and media monitoring applications. While current systems report high accuracies for this task, they either work at mention level (getting credit for detecting uninformative mentions such as pronouns), or assume the coreferent mentions have been detected beforehand; the inaccuracie...
Towards automatic detection of reported speech in dialogue using prosodic cues
The phenomenon of reported speech – whereby we quote the words, thoughts and opinions of others, or recount past dialogue – is widespread in conversational speech. Detecting such quotations automatically has numerous applications: for example, in enhancing automatic transcription or spoken language understanding applications. However, the task is challenging, not least because lexical cues of q...
Audio quotation marks for natural language understanding
Detecting the presence of quotations in speech is a difficult task for automatic natural language understanding. This paper presents a study on the correlation between three prosodic features present in a voice command and the presence or absence of quotations. These features consist of intra-word pause durations, F0 reset and F0 continuity. A combination of lexical and prosodic extraction tool...
Direct Reported Speech in Multilingual Texts: Automatic Annotation and Semantic Categorization
We propose an application for the automatic identification and categorization of quotations. The categorization is based on a semantic map of enunciative modalities. The texts are treated in three languages: Arabic, Korean and French. 1. General presentation and related works Automatic identification of quotations using natural language processing (NLP) is now significantly growing in recent st...
Detection of quotations and inserted clauses and its application to dependency structure analysis in spontaneous Japanese
Japanese dependency structure is usually represented by relationships between phrasal units called bunsetsus. One of the biggest problems with dependency structure analysis in spontaneous speech is that clause boundaries are ambiguous. This paper describes a method for detecting the boundaries of quotations and inserted clauses and that for improving the dependency accuracy by applying the dete...